Managing character mappings

During typical data indexing, diacritics are normalized; for example, ü and u are equivalent. In some cases, however, it may be preferable to map a diacritic character to another character or to a character pair. For example, in German the U-umlaut (ü) can also be represented by “ue”, making Müller and Mueller equivalent, but not the same as Muller.

Similarly, users may type “ss” instead of ß (the German eszett, or sharp S) because it is easier to enter on a standard keyboard, but they expect to find words spelled with either “ß” or “ss”, not words spelled with a single “s”.

Character Mapping lets you create mappings for special cases like these. When you create a mapping, the character you map is normalized both during indexing and when users enter search terms.

For example, when you map ü to ue, any ü in your data will be indexed as “ue” instead of “u”. When users enter a term such as “Müller” with the U-umlaut, the ü is normalized to “ue”, so the search returns results for both Müller and Mueller. Likewise, when users enter “Mueller”, they will find both Müller and Mueller. “Muller” may also be returned because of fuzzy matching, but it will not be an exact match; it will have lower relevance and appear after the exact matches.
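The following Python sketch illustrates the idea of applying one normalization step to both stored data and search terms. It is not Portfolio's implementation: the CHAR_MAPPINGS table and the normalize() function are hypothetical, and the engine's default diacritic folding is approximated here with Unicode NFD decomposition. Because the same function handles indexing and searching, Müller and Mueller produce the same token, while Muller does not.

```python
import unicodedata

# Hypothetical mapping table for illustration; Portfolio's actual
# configuration format is not shown in this document.
CHAR_MAPPINGS: dict[str, str] = {"ü": "ue", "ß": "ss"}


def normalize(term: str, mappings: dict[str, str] = CHAR_MAPPINGS) -> str:
    """Normalize a term the same way for indexing and for searching (sketch)."""
    # Compose characters first so a decomposed "u" + combining diaeresis
    # is treated the same as a precomposed "ü", then lowercase.
    term = unicodedata.normalize("NFC", term).lower()
    # Custom character mappings are applied before default folding.
    for source, target in mappings.items():
        term = term.replace(source, target)
    # Approximate default diacritic folding: decompose and drop combining marks.
    decomposed = unicodedata.normalize("NFD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


for word in ["Müller", "Mueller", "Muller", "groß"]:
    print(word, "->", normalize(word))
# Müller -> mueller
# Mueller -> mueller
# Muller -> muller
# groß -> gross
```

Calling normalize("Müller", {}) with no mappings returns "muller", which corresponds to the default behavior described at the start of this topic.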

Character mappings apply to all data sources in your Portfolio instance.

You must re-index all data sources after making changes to character mappings. (You do not need to re-harvest or re-import data; only a re-index is needed.) For more information about running or scheduling a search source task, see Scheduling tasks.

Character mappings are cached with other search settings. After re-indexing your data, you must also refresh the search settings cache by clicking Refresh Cache (see Refreshing the search cache).

Until both the caches and the indexes are refreshed, you may not get the results you expect. For example, if you map ü to ue and the caches are refreshed before re-indexing is complete, a search for Müller will match Mueller exactly but not Müller, because Müller has not yet been re-indexed with the mapping. Müller may still appear in the results because of fuzzy matching, but it will not be an exact match.
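To see why both steps matter, consider a simplified model in which each record is indexed as a single normalized token and an exact match means the query token equals an indexed token. This Python sketch is hypothetical: normalize(), exact_matches(), OLD_MAPPINGS, and NEW_MAPPINGS are illustrative names, and fuzzy matching is ignored. It searches an index built before the ü mapping existed, using the new mapping from the refreshed cache.

```python
import unicodedata


def normalize(term: str, mappings: dict[str, str]) -> str:
    """Sketch of index/query normalization: custom mappings, then folding."""
    term = unicodedata.normalize("NFC", term).lower()
    for source, target in mappings.items():
        term = term.replace(source, target)
    decomposed = unicodedata.normalize("NFD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def exact_matches(query: str, index: list[tuple[str, str]],
                  query_mappings: dict[str, str]) -> list[str]:
    """Return records whose indexed token equals the normalized query."""
    token = normalize(query, query_mappings)
    return [record for indexed_token, record in index if indexed_token == token]


RECORDS = ["Müller", "Mueller"]
OLD_MAPPINGS: dict[str, str] = {}   # hypothetical: no mapping at last re-index
NEW_MAPPINGS = {"ü": "ue"}          # hypothetical: mapping now in the cache

# Cache refreshed, index not yet rebuilt: Müller is still indexed as "muller".
stale_index = [(normalize(r, OLD_MAPPINGS), r) for r in RECORDS]
# After re-indexing, both records are indexed as "mueller".
fresh_index = [(normalize(r, NEW_MAPPINGS), r) for r in RECORDS]

print(exact_matches("Müller", stale_index, NEW_MAPPINGS))  # ['Mueller']
print(exact_matches("Müller", fresh_index, NEW_MAPPINGS))  # ['Müller', 'Mueller']
```

In this model, the query side already uses the mapping, but the stored token for Müller does not, so only Mueller matches exactly until the index is rebuilt.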

If you choose to display the Excerpt field in your search results, note that the mapped form of the character will display instead of the original. For example, if you map ß to ss, when a user searches for groß, the search will return results for groß and gross. Where the original data was groß, the highlighted excerpt will display gross. Other fields will display the original.

Character mapping is most commonly used to map a single character to a character pair. You do not need to map every diacritic character to its unaccented equivalent; by default, the indexing engine already normalizes diacritic characters.

If you need to preserve diacritics that are essential to the meaning of the words (for example, in Swedish Anden and Ånden are very different words), do not use character mappings. Instead, contact SirsiDynix to configure diacritic support for your search indexing.

For information about adding or removing mappings, see these topics: